17 research outputs found

    A framework for the construction of generative models for mesoscale structure in multilayer networks

    Multilayer networks allow one to represent diverse and coupled connectivity patterns—such as time-dependence, multiple subsystems, or both—that arise in many applications and which are difficult or awkward to incorporate into standard network representations. In the study of multilayer networks, it is important to investigate mesoscale (i.e., intermediate-scale) structures, such as dense sets of nodes known as communities, to discover network features that are not apparent at the microscale or the macroscale. The ill-defined nature of mesoscale structure and its ubiquity in empirical networks make it crucial to develop generative models that can produce the features that one encounters in empirical networks. Key purposes of such models include generating synthetic networks with empirical properties of interest, benchmarking mesoscale-detection methods and algorithms, and inferring structure in empirical multilayer networks. In this paper, we introduce a framework for the construction of generative models for mesoscale structures in multilayer networks. Our framework provides a standardized set of generative models, together with an associated set of principles from which they are derived, for studies of mesoscale structures in multilayer networks. It unifies and generalizes many existing models for mesoscale structures in fully ordered (e.g., temporal) and unordered (e.g., multiplex) multilayer networks. One can also use it to construct generative models for mesoscale structures in partially ordered multilayer networks (e.g., networks that are both temporal and multiplex). Our framework has the ability to produce many features of empirical multilayer networks, and it explicitly incorporates a user-specified dependency structure between layers. We discuss the parameters and properties of our framework, and we illustrate examples of its use with benchmark models for community-detection methods and algorithms in multilayer networks

    Community detection in temporal multilayer networks, with an application to correlation networks

    Networks are a convenient way to represent complex systems of interacting entities. Many networks contain "communities" of nodes that are more densely connected to each other than to nodes in the rest of the network. In this paper, we investigate the detection of communities in temporal networks represented as multilayer networks. As a focal example, we study time-dependent financial-asset correlation networks. We first argue that the use of the "modularity" quality function, which is defined by comparing edge weights in an observed network to expected edge weights in a "null network", is application-dependent. We differentiate between "null networks" and "null models" in our discussion of modularity maximization, and we highlight that the same null network can correspond to different null models. We then investigate a multilayer modularity-maximization problem to identify communities in temporal networks. Our multilayer analysis depends only on the form of the maximization problem and not on the specific quality function that one chooses. We introduce a diagnostic to measure persistence of community structure in a multilayer network partition. We prove several results that describe how the multilayer maximization problem measures a trade-off between static community structure within layers and larger values of persistence across layers. We also discuss some computational issues that the popular "Louvain" heuristic faces with temporal multilayer networks and suggest ways to mitigate them. Comment: 42 pages, many figures, final accepted version before typesetting.
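
    The two quantities named in this abstract can be illustrated with a minimal sketch. It assumes the Newman-Girvan null network (expected weight k_i k_j / 2m), which is only one possible choice since the abstract stresses that the null network is application-dependent. It also assumes that persistence counts the nodes that keep their community label from one layer to the next; the abstract does not state the formula. The function names are hypothetical.

```python
def modularity(adj, labels):
    """Single-layer modularity with the Newman-Girvan null network (an assumption).

    adj    -- symmetric weight matrix as a list of lists
    labels -- labels[i] is the community of node i
    """
    n = len(adj)
    strength = [sum(row) for row in adj]
    two_m = sum(strength)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus expected weight in the null network
                q += adj[i][j] - strength[i] * strength[j] / two_m
    return q / two_m


def persistence(partition):
    """Count nodes whose label is unchanged between consecutive layers (assumed definition).

    partition[s][i] is the community of node i in layer s.
    """
    return sum(
        1
        for current, following in zip(partition, partition[1:])
        for a, b in zip(current, following)
        if a == b
    )


# Two triangles joined by a single edge, used as one layer.
layer = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(modularity(layer, [0, 0, 0, 1, 1, 1]))
print(persistence([[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]]))
```

    A multilayer quality function would combine per-layer terms like the first with an interlayer coupling term; that combination is not reproduced here.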

    Core-periphery structure in directed networks

    Empirical networks often exhibit different meso-scale structures, such as community and core–periphery structures. Core–periphery structure typically consists of a well-connected core and a periphery that is well connected to the core but sparsely connected internally. Most core–periphery studies focus on undirected networks. We propose a generalization of core–periphery structure to directed networks. Our approach yields a family of core–periphery block model formulations in which, contrary to many existing approaches, core and periphery sets are edge-direction dependent. We focus on a particular structure consisting of two core sets and two periphery sets, which we motivate empirically. We propose two measures to assess the statistical significance and quality of our novel structure in empirical data, where one often has no ground truth. To detect core–periphery structure in directed networks, we propose three methods adapted from two approaches in the literature, each with a different trade-off between computational complexity and accuracy. We assess the methods on benchmark networks where our methods match or outperform standard methods from the literature, with a likelihood approach achieving the highest accuracy. Applying our methods to three empirical networks—faculty hiring, a world trade dataset and political blogs—illustrates that our proposed structure provides novel insights in empirical networks

    Mining the UK web archive for semantic change detection

    Semantic change detection (i.e., identifying words whose meaning has changed over time) started emerging as a growing area of research over the past decade, with important downstream applications in natural language processing, historical linguistics and computational social science. However, several obstacles make progress in the domain slow and difficult. These pertain primarily to the lack of well-established gold standard datasets, resources to study the problem at a fine-grained temporal resolution, and quantitative evaluation approaches. In this work, we aim to mitigate these issues by (a) releasing a new labelled dataset of more than 47K word vectors trained on the UK Web Archive over a short time-frame (2000-2013); (b) proposing a variant of Procrustes alignment to detect words that have undergone semantic shift; and (c) introducing a rank-based approach for evaluation purposes. Through extensive numerical experiments and validation, we illustrate the effectiveness of our approach against competitive baselines. Finally, we also make our resources publicly available to further enable research in the domain.
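
    To make the alignment step concrete, here is a minimal sketch of standard orthogonal Procrustes alignment between two embedding matrices. This is the textbook technique, not the variant proposed in the paper, which the abstract does not describe. Scoring each word by cosine distance after alignment is likewise an assumption, and the function names are hypothetical.

```python
import numpy as np


def procrustes_align(X, Y):
    """Rotate X onto Y with the orthogonal matrix R that minimises ||X R - Y||_F.

    X, Y -- arrays of shape (n_words, dim); row i in both is the same word.
    """
    # The minimiser is R = U V^T, where X^T Y = U S V^T is an SVD.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return X @ (U @ Vt)


def shift_scores(X, Y):
    """Cosine distance between each aligned row of X and the matching row of Y."""
    Xa = procrustes_align(X, Y)
    num = np.sum(Xa * Y, axis=1)
    den = np.linalg.norm(Xa, axis=1) * np.linalg.norm(Y, axis=1)
    return 1.0 - num / den


# Toy data: the second embedding is a rotated copy of the first,
# except that the vector of word 0 has been flipped.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # a random orthogonal matrix
Y = X @ Q
Y[0] = -Y[0]
ranking = np.argsort(-shift_scores(X, Y))  # most-shifted words first
print(ranking[:5])
```

    Sorting words by score yields a ranking of candidate shifts, which is the kind of output a rank-based evaluation would consume.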

    Pull out all the stops : textual analysis via punctuation sequences

    Whether enjoying the lucid prose of a favourite author or slogging through some other writer’s cumbersome, heavy-set prattle (full of parentheses, em dashes, compound adjectives, and Oxford commas), readers will notice stylistic signatures not only in word choice and grammar but also in punctuation itself. Indeed, visual sequences of punctuation from different authors produce marvellously different (and visually striking) sequences. Punctuation is a largely overlooked stylistic feature in stylometry, the quantitative analysis of written text. In this paper, we examine punctuation sequences in a corpus of literary documents and ask the following questions: Are the properties of such sequences a distinctive feature of different authors? Is it possible to distinguish literary genres based on their punctuation sequences? Do the punctuation styles of authors evolve over time? Are we on to something interesting in trying to do stylometry without words, or are we full of sound and fury (signifying nothing)?

    Local2Global : a distributed approach for scaling representation learning on graphs

    We propose a decentralised “local2global” approach to graph representation learning that one can a priori use to scale any embedding technique. Our local2global approach proceeds by first dividing the input graph into overlapping subgraphs (or “patches”) and training local representations for each patch independently. In a second step, we combine the local representations into a globally consistent representation by estimating the set of rigid motions that best align the local representations using information from the patch overlaps, via group synchronization. A key distinguishing feature of local2global relative to existing work is that patches are trained independently without the need for the often costly parameter synchronization during distributed training. This allows local2global to scale to large-scale industrial applications, where the input graph may not even fit into memory and may be stored in a distributed manner. We apply local2global on data sets of different sizes and show that our approach achieves a good trade-off between scale and accuracy on edge reconstruction and semi-supervised classification. We also consider the downstream task of anomaly detection and show how one can use local2global to highlight anomalies in cybersecurity networks.

    How to Data in Datathons

    The rise of datathons, also known as data or data science hackathons, has provided a platform to collaborate, learn, and innovate in a short timeframe. Despite their significant potential benefits, organizations often struggle to effectively work with data due to a lack of clear guidelines and best practices for potential issues that might arise. Drawing on our own experiences and insights from organizing >80 datathon challenges with >60 partnership organizations since 2016, we provide guidelines and recommendations that serve as a resource for organizers to navigate the data-related complexities of datathons. We apply our proposed framework to 10 case studies. Comment: 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks.

    Ramsey theory

    Ramsey theory is a field of mathematics dating back approximately 100 years. It intersects with various branches of mathematics, such as combinatorics, number theory, geometry, topology and set theory [16]. Loosely speaking, Ramsey theory can be described as the study of structure which is preserved under partitions, an idea succinctly captured by the statement “complete disorder is impossible” [6, 10]. In this essay we explore Ramsey’s theorems, some of the core results underpinning Ramsey theory and dealing with invariant substructures under finite set partitioning. We then discuss some extensions of these ideas in the case of infinite set partitioning.
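
    A small worked instance of "complete disorder is impossible" is the classical fact R(3,3) = 6: every two-colouring of the edges of the complete graph on six vertices contains a monochromatic triangle, while five vertices are not enough. This example is not taken from the essay; the brute-force check below is only a sketch, and its function names are hypothetical.

```python
from itertools import combinations, product


def has_mono_triangle(n, colour):
    """True if some triangle on vertices 0..n-1 has all three edges the same colour.

    colour maps each edge (i, j) with i < j to 0 or 1.
    """
    return any(
        colour[(a, b)] == colour[(a, c)] == colour[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )


def every_colouring_has_mono_triangle(n):
    """Exhaustively test all 2-colourings of the edges of the complete graph K_n."""
    edges = list(combinations(range(n), 2))
    return all(
        has_mono_triangle(n, dict(zip(edges, bits)))
        for bits in product((0, 1), repeat=len(edges))
    )


print(every_colouring_has_mono_triangle(5))
print(every_colouring_has_mono_triangle(6))
```

    The search covers 2^15 colourings for six vertices, so it runs quickly; the same brute-force approach becomes infeasible for larger Ramsey numbers.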

    Community structure in temporal multilayer networks, and its application to financial correlation networks: Community structure in temporal networks

    Many real-world applications in the social, biological, and physical sciences involve large systems of entities that interact together in some way. The number of components in these systems can be extremely large, so some simplification is typically needed for tractable analysis. A common representation of interacting entities is a network. In its simplest form, a network consists of a set of nodes that represent entities and a set of edges between pairs of nodes that represent interactions between those entities. In this thesis, we investigate clustering techniques for time-dependent networks. An important mesoscale feature in networks is communities. Most community-detection methods are designed for time-independent networks. A recent framework for representing temporal networks is multilayer networks. In this thesis, we focus primarily on community detection in temporal networks represented as multilayer networks. We investigate three main topics: a community-detection method known as multilayer modularity maximization, the development of a benchmark for community detection in temporal networks, and the application of multilayer modularity maximization to temporal financial asset-correlation networks. We first investigate theoretical and computational issues in multilayer modularity maximization. We introduce a diagnostic to measure persistence of community structure in a multilayer network partition, and we show how the communities that one obtains with multilayer modularity maximization reflect a trade-off between time-independent community structure within layers and temporal persistence between layers. We discuss computational issues that can arise when applying this method in practice and we suggest ways to mitigate them. We then propose a benchmark for community detection in temporal networks and carry out various numerical experiments to compare the performance of different methods and computational heuristics on our benchmark. We end with an application of multilayer modularity maximization to temporal financial correlation networks.